
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Authors: Albrecht, Stefano V.; Christianos, Filippos; Schäfer, Lukas
Publication date: 06/12/2020

Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.

Comment: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv.org e-Print Archive

Edinburgh Research Explorer

Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning

Authors: Albrecht, Stefano V.; Christianos, Filippos; Papoudakis, Georgios
Publication date: 22/07/2023

This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC which is shown to efficiently scale in games with a large number of agents.

Comment: 20 pages, 12 figures
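The property the abstract relies on, that in a no-conflict game one equilibrium gives every agent its highest return, can be illustrated on a small matrix game. The sketch below is not the Pareto-AC algorithm and involves no learning. It enumerates the pure-strategy Nash equilibria of a two-player game and picks the one that is at least as good for every agent as every other equilibrium. The stag-hunt payoff values and all function names are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import product

# Hypothetical stag-hunt payoffs (illustrative values, not from the paper):
# PAYOFFS[(a1, a2)] = (return of agent 1, return of agent 2).
STAG, HARE = 0, 1
ACTIONS = (STAG, HARE)
PAYOFFS = {
    (STAG, STAG): (4, 4),
    (STAG, HARE): (0, 3),
    (HARE, STAG): (3, 0),
    (HARE, HARE): (2, 2),
}


def is_pure_nash(joint):
    """True if neither agent can gain by unilaterally changing its action."""
    a1, a2 = joint
    r1, r2 = PAYOFFS[joint]
    agent1_ok = all(PAYOFFS[(d, a2)][0] <= r1 for d in ACTIONS)
    agent2_ok = all(PAYOFFS[(a1, d)][1] <= r2 for d in ACTIONS)
    return agent1_ok and agent2_ok


def pure_nash_equilibria():
    """Enumerate all pure-strategy Nash equilibria of the matrix game."""
    return [j for j in product(ACTIONS, repeat=2) if is_pure_nash(j)]


def pareto_optimal_equilibrium(equilibria):
    """Return the equilibrium that is weakly best for every agent, or None."""
    for candidate in equilibria:
        if all(
            PAYOFFS[candidate][i] >= PAYOFFS[other][i]
            for other in equilibria
            for i in range(2)
        ):
            return candidate
    return None


equilibria = pure_nash_equilibria()
best = pareto_optimal_equilibrium(equilibria)
print("Equilibria:", equilibria)
print("Pareto-optimal equilibrium:", best, "with returns", PAYOFFS[best])
```

In this toy game both (stag, stag) and (hare, hare) are equilibria, but only the former maximises the return of both agents. Independent learners that are uncertain about each other's policy can settle on the safer, Pareto-dominated (hare, hare) outcome, which is the failure mode the abstract describes.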

arXiv.org e-Print Archive